Search results for "Genomic data"

showing 10 items of 23 documents

Penalized regression and clustering in high-dimensional data

The main goal of this thesis is to describe numerous statistical techniques for handling high-dimensional genomic data. The thesis begins with a review of the literature on penalized regression models, with particular attention to the least absolute shrinkage and selection operator (LASSO), or L1-penalty, methods. L1-penalized logistic and multinomial regression models are used for variable selection and discriminant analysis with a binary or categorical response variable. The thesis discusses and compares several methods that are commonly used in genetics, and introduces new strategies to select markers according to their informative content and to discriminate clusters by offering reduced panels for popul…

High-dimensional data; Quantile regression coefficients modeling; Tuning parameter selection; Genomic data; Lasso regression; Curves clustering; Settore SECS-S/01 - Statistica

researchProduct

Whole-Genome Analyses

2014

Average nucleotide identity (ANI) was proposed almost 10 years ago as a means to compare genetic relatedness among prokaryotic strains. ANI values of around 95% were found to correspond to the 70% DNA–DNA hybridization cut-off value that is widely used to delineate archaeal and bacterial species. ANI calculation is one of many approaches that can be derived from comparative genomic data and used for taxonomic purposes. Here, an overview of the impact and current usage of ANI values is given, together with details of JSpecies, a user-friendly, biology-oriented software package that can be used to generate two types of ANI calculations …

body regions; Comparative genomics; Genetics; DNA–DNA hybridization; Genomic data; Identity (object-oriented programming); Computational biology; Genetic relatedness; Biology; Software package; Genome

researchProduct

Glomeromycotina: what is a species and why should we care?

2018

A workshop at the recent International Conference on Mycorrhiza was focused on species recognition in Glomeromycotina and parts of their basic biology that define species. The workshop was motivated by the paradigm-shifting evidence derived from genomic data for sex and for the lack of heterokaryosis, and by published exchanges in Science that were based on different species concepts and have led to differing views of dispersal and endemism in these fungi. Although a lively discussion ensued, there was general agreement that species recognition in the group is in need of more attention, and that many basic assumptions about the biology of these important fungi includ…

0106 biological sciences; 0301 basic medicine; Physiology; Genomic data; [SDV]Life Sciences [q-bio]; education; Arbuscular mycorrhizal fungi; clonality; Plant Science; 01 natural sciences; 03 medical and health sciences; Species Specificity; species recognition; Similarity (psychology); Clonal reproduction; sex; [SDV.BV]Life Sciences [q-bio]/Vegetal Biology; Endemism; Glomeromycota; Phylogeny; heterokaryosis; Glomeromycotina; 030104 developmental biology; Geography; Evolutionary biology; [SDE]Environmental Sciences; Biological dispersal; 010606 plant biology & botany

researchProduct

Functional comparison of bacteria from the human gut and closely related non-gut bacteria reveals the importance of conjugation and a paucity of moti…

2016

The human GI tract is a complex and still poorly understood environment, inhabited by one of the densest microbial communities on earth. The gut microbiota has been shaped by millennia of evolution to co-exist with the host in commensal or symbiotic relationships. Members of the gut microbiota perform specific molecular functions important in the human gut environment. This can be illustrated by the presence of a highly expanded repertoire of proteins involved in carbohydrate metabolism, in keeping with the large diversity of polysaccharides originating from the diet or from the host itself that can be encountered in this environment. In order to identify other bacterial fun…

0301 basic medicine; [SDV]Life Sciences [q-bio]; lcsh:Medicine; Gut flora; Pathology and Laboratory Medicine; medicine.disease_cause; Biochemistry; Database and Informatics Methods; RNA Ribosomal 16S; Medicine and Health Sciences; DNA metabolism; lcsh:Science; Phylogeny; Protein Metabolism; Clostridium Botulinum; Multidisciplinary; biology; Chemotaxis; Gastrointestinal Microbiome; digestive oral and skin physiology; Human microbiome; Genomics; Bacterial Physiological Phenomena; Genomic Databases; Adaptation Physiological; Bacterial Pathogens; Nucleic acids; Medical Microbiology; Conjugation Genetic; Pathogens; Bacteroides thetaiotaomicron; Research Article; Cell Physiology; Research and Analysis Methods; Biosynthesis; Microbiology; digestive system; 03 medical and health sciences; Bacterial Proteins; Genetics; medicine; Humans; Microbial Pathogens; Escherichia coli; Clostridium; Bacteria; 030102 biochemistry & molecular biology; Gut Bacteria; lcsh:R; Organisms; Biology and Life Sciences; Computational Biology; Cell Biology; DNA; Genome Analysis; biology.organism_classification; Cell Metabolism; Biological Databases; Metabolism; 030104 developmental biology; Evolutionary biology; lcsh:Q; Genome Bacterial

researchProduct

Reconfigurable Accelerator for the Word-Matching Stage of BLASTN

2013

BLAST is one of the most popular sequence analysis tools used by molecular biologists. It is designed to efficiently find similar regions between two sequences that have biological significance. However, because the size of genomic databases is growing rapidly, the computation time of BLAST, when performing a complete genomic database search, is continuously increasing. Thus, there is a clear need to accelerate this process. In this paper, we present a new approach for genomic sequence database scanning utilizing reconfigurable field programmable gate array (FPGA)-based hardware. In order to derive an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate the…

Speedup; Sequence database; Hardware and Architecture; Computer science; Sequence analysis; Genomics; Parallel computing; Electrical and Electronic Engineering; Data structure; Genomic databases; Software; Reconfigurable computing; Word (computer architecture); IEEE Transactions on Very Large Scale Integration (VLSI) Systems

researchProduct

Ten millennia of hepatitis B virus evolution

2021

Hepatitis B virus (HBV) has been infecting humans for millennia and remains a global health problem, but its past diversity and dispersal routes are largely unknown. We generated HBV genomic data from 137 Eurasians and Native Americans dated between ~10,500 and ~400 years ago. We date the most recent common ancestor of all HBV lineages to between ~20,000 and 12,000 years ago, with the virus present in European and South American hunter-gatherers during the early Holocene. After the European Neolithic transition, Mesolithic HBV strains were replaced by a lineage likely disseminated by early farmers that prevailed throughout western Eurasia for ~4000 years, declining around the end of the 2nd…

Phylogeographic history; Hepatitis B/history; 01 natural sciences; The Republic; Communicable Diseases Emerging; German; Communicable Diseases Emerging/history; Agency (sociology); Science and technology; ComputingMilieux_MISCELLANEOUS; History Ancient; Phylogeny; media_common; 0303 health sciences; Multidisciplinary; Ancient DNA; European research; virus diseases; Genomics; Hepatitis B; 3. Good health; Europe; language; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Christian ministry; Paleogenomic analyses; Asian Continental Ancestry Group; 010506 paleontology; Hepatitis B virus; Asia; Hepatitis B virus/classification; European Continental Ancestry Group; Library science; Biología Celular; White People; Marie curie; Evolution Molecular; 03 medical and health sciences; American Natives; Asian People; Political science; Genomic data; media_common.cataloged_instance; Humans; Slovak; European union; American Indian or Alaska Native; 030304 developmental biology; 0105 earth and related environmental sciences; Genetic Variation; Paleontology; Prehistoria; A300; language.human_language; digestive system diseases; Americas

researchProduct

Reactome graph database: Efficient access to complex pathway data

2018

Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its qu…

0301 basic medicine; Databases Factual; Computer science; Data management; Knowledge Bases; Social Sciences; Information Storage and Retrieval; NoSQL; computer.software_genre; Computer Applications; Database and Informatics Methods; User-Computer Interface; 0302 clinical medicine; Knowledge extraction; Psychology; Database Searching; lcsh:QH301-705.5; Language; Biological data; Ecology; Systems Biology; Genomics; Genomic Databases; Computational Theory and Mathematics; Modeling and Simulation; Web-Based Applications; Graph (abstract data type); Information Technology; Research Article; Computer and Information Sciences; Relational database; Query language; Research and Analysis Methods; Ecosystems; 03 medical and health sciences; Cellular and Molecular Neuroscience; Databases; Genetics; Computer Graphics; Humans; Molecular Biology; Ecology Evolution Behavior and Systematics; Internet; Information retrieval; Graph database; business.industry; Ecology and Environmental Sciences; Cognitive Psychology; Biology and Life Sciences; Computational Biology; Genome Analysis; Relational Databases; 030104 developmental biology; Biological Databases; lcsh:Biology (General); Cognitive Science; business; computer; 030217 neurology & neurosurgery; Software; Neuroscience; PLoS Computational Biology

researchProduct

According to the CPLL proteome sheriffs, not all aperitifs are created equal!

2014

Combinatorial peptide ligand libraries (CPLLs) have been adopted for investigating the proteome of a popular aperitif in Northern Italy, called "Amaro Branzi", stated to be an infusion of a secret herbal mixture, of which some ingredients are declared on the label, namely Angelica officinalis, Gentiana lutea and orange peel, sweetened by a final addition of honey. In order to assess the genuineness of this commercial liqueur, we have prepared extracts of the three vegetable ingredients, assessed their proteomes, and compared them to the one found in the aperitif. The amaro's proteome was identified via prior capture with CPLLs at two different pH values (2.2 and 4.8). Via mass spectrometry …

Proteome; Genomic data; Biophysics; Orange (colour); Biochemistry; Analytical Chemistry; Gentiana lutea; Peptide Library; Humans; Gentiana; Angelica officinalis; Aperitifs; Combinatorial peptide ligand libraries; Low abundance proteome; Mass spectrometry; Alcoholic Beverages; Angelica; Citrus sinensis; Citrus × sinensis; Fruit; Honey; Hydrogen-Ion Concentration; Plant Extracts; Plant Proteins; Molecular Biology; Chromatography; biology; biology.organism_classification; Northern italy; Officinalis

researchProduct

Detection of batch effects in liquid chromatography-mass spectrometry metabolomic data using guided principal component analysis.

2014

Metabolomics based on liquid chromatography-mass spectrometry (LC-MS) is a powerful tool for studying dynamic responses of biological systems to different physiological or pathological conditions. Differences in the instrumental response within and between batches introduce unwanted and uncontrolled data variation that should be removed to extract useful information. This work exploits a recently developed method for the identification of batch effects in high throughput genomic data based on the calculation of a delta statistic through principal component analysis (PCA) and guided PCA. Its applicability to LC-MS metabolomic data was tested on two real examples. The first example involved t…

Quality Control; Principal Component Analysis; Chromatography; Chemistry; Genomic data; Guided principal component analysis; Mass spectrometry; Batch effect; Analytical Chemistry; Data set; Plasma; Metabolomics; Liquid chromatography–mass spectrometry; Peak intensity; Calibration; Liquid chromatography-mass spectrometry (LC-MS); Humans; Biological system; Statistic; Chromatography Liquid; Talanta

researchProduct

Genomic Databases Characteristics

2013

Genomic databases

researchProduct